Our business model is to operate a crowd-investing platform on which people who have a business idea but lack the necessary funds can register and raise money for their project within a set period.
On the other side are investors who want to put their money into projects and are looking for investment opportunities.
As an intermediary, our platform thus brings borrowers and lenders together.
Our data basis is the platform's historical record.
Assumptions made about our business model
All projects are completed, i.e. the period for raising money for the project has expired. Our business model provides that the funds collected are paid out even if the target amount was not reached.
We earn our money through a commission on every project listed on our platform. We assume that we generally receive a percentage share, so more project volume also means more commission. In other words, more projects or more expensive projects are to our advantage.
The goal is therefore to expand the business.
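To make the commission assumption concrete, here is a minimal sketch. The 5% rate and the function name `estimate_commission` are purely illustrative assumptions; the actual commission terms are not part of the data.

```python
# Hypothetical flat commission rate; the real rate is not given in the data.
ASSUMED_COMMISSION_RATE = 0.05


def estimate_commission(funded_amounts, rate=ASSUMED_COMMISSION_RATE):
    """Return the commission earned on a list of paid-out amounts (USD)."""
    return sum(amount * rate for amount in funded_amounts)


few_small_projects = [300.0, 575.0, 150.0]
more_or_larger_projects = [300.0, 575.0, 150.0, 4150.0]

print(estimate_commission(few_small_projects))
print(estimate_commission(more_or_larger_projects))
```

Under this assumption, both additional projects and larger projects raise the commission, which is why the analysis focuses on growth in volume.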
- funded_amount ... amount received/paid out at the end of the crowdfunding period, in USD
- loan_amount ... target amount (the amount the project aimed to raise), in USD
- activity ... subcategory to which the goal of the crowd project belongs thematically
- sector ... top-level category into which the crowd project's topic falls
- use ... short description of what the money will be used for
- country_code ... country code according to ISO standard
- country ... country name according to ISO standard
- region ... region
- currency ... currency in which the funded_amount was paid out
- term_in_months ... period over which the loan is to be disbursed
- lender_count ... number of lenders (i.e. how many people gave money to the project)
- borrower_genders ... gender and number of borrowers, i.e. those who initiated the crowd project
- repayment_interval ... contractually agreed repayment terms/rhythm
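Since `borrower_genders` encodes both gender and number of borrowers in one string, a small helper can split it into counts. This is a sketch only: the helper name `count_genders` is hypothetical, and it assumes the comma-separated format seen in values such as "female, female".

```python
from collections import Counter


def count_genders(borrower_genders):
    """Count borrowers per gender in a string such as 'female, female, male'."""
    parts = [part.strip() for part in borrower_genders.split(",")]
    # Ignore empty fragments, e.g. from a trailing comma.
    return Counter(part for part in parts if part)


counts = count_genders("female, female, male")
print(counts["female"], counts["male"], sum(counts.values()))
```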
import numpy as np
import pandas as pd
import plotly.express as px
# for the dashboard
from dash import Dash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output
from dash import no_update
# misc
import re
# Important: when the notebook is exported to HTML, this makes the figures appear in the HTML as well
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
# recommended behaviour for pandas; avoids warnings.
pd.options.mode.copy_on_write = True
df = pd.read_csv("data_abschlussprojekt.csv", engine='python', nrows=2)
df
| # funded_amount# loan_amount# activity# sector# use# country_code# country# region# currency# term_in_months# lender_count# borrower_genders# repayment_interval | |
|---|---|
| 0#300.0#300.0#Fruits & Vegetables#Food#To buy seasonal | fresh fruits to sell. #PK#Pakistan#Lahore#PKR... |
| 1#575.0#575.0#Rickshaw#Transportation#to repair and maintain the auto rickshaw used in their business.#PK#Pakistan#Lahore#PKR#11.0#14#female | female#irregular |
df.columns
Index(['# funded_amount# loan_amount# activity# sector# use# country_code# country# region# currency# term_in_months# lender_count# borrower_genders# repayment_interval'], dtype='object')
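The single column name above suggests that the file is not comma-separated. As a minimal sketch (the helper `guess_delimiter` and its candidate list are illustrative assumptions, not part of pandas), one could count candidate delimiters in the header line before choosing `sep`:

```python
def guess_delimiter(header_line, candidates=("#", ",", ";", "\t", "|")):
    """Return the candidate character that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


# Shortened version of the header shown in the output above.
header = "# funded_amount# loan_amount# activity# sector# use# country_code"
print(repr(guess_delimiter(header)))
```

For this header the most frequent candidate is `#`, which matches the separator used in the corrected `read_csv` call.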
df = pd.read_csv("data_abschlussprojekt.csv",
                 sep='#',
                 engine="python",
                 skipinitialspace=True,
                 index_col=0)
df
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12.0 | 12 | female | irregular |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11.0 | 14 | female, female | irregular |
| 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | Maynaguri | INR | 43.0 | 6 | female | bullet |
| 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | Lahore | PKR | 11.0 | 8 | female | irregular |
| 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | Abdul Hakeem | PKR | 14.0 | 16 | female | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
671205 rows × 13 columns
#### Observation: problematic entries in `use` such as "[True, u'to start a turducken farm.']"
df.columns  # columns
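One possible way to handle such entries is to extract the quoted text with a regular expression. The following is a sketch under assumptions: the helper `clean_use` is hypothetical, the pattern only covers the `[True, u'...']` shape visible above, and the trailing text in the test string is made up for illustration. Whether these rows are test data that should be dropped instead remains an open question.

```python
import re

# Matches a leading "[True, u'...']" wrapper and captures the quoted text.
WRAPPED_USE = re.compile(r"^\[True, u'(?P<text>.*?)'\]")


def clean_use(value):
    """Return the inner description if the value is wrapped, else the value unchanged."""
    match = WRAPPED_USE.match(value)
    return match.group("text") if match else value


print(clean_use("[True, u'to start a turducken farm.'] - hypothetical suffix"))
print(clean_use("to purchase one buffalo."))
```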
Index(['funded_amount', 'loan_amount', 'activity', 'sector', 'use',
'country_code', 'country', 'region', 'currency', 'term_in_months',
'lender_count', 'borrower_genders', 'repayment_interval'],
dtype='object')
df.info()  # data types
<class 'pandas.core.frame.DataFrame'>
Index: 671205 entries, 0 to 671204
Data columns (total 13 columns):
 #   Column              Non-Null Count   Dtype
---  ------              --------------   -----
 0   funded_amount       671205 non-null  float64
 1   loan_amount         671205 non-null  float64
 2   activity            671205 non-null  object
 3   sector              671205 non-null  object
 4   use                 666972 non-null  object
 5   country_code        671197 non-null  object
 6   country             671205 non-null  object
 7   region              614405 non-null  object
 8   currency            671205 non-null  object
 9   term_in_months      671205 non-null  float64
 10  lender_count        671205 non-null  int64
 11  borrower_genders    666984 non-null  object
 12  repayment_interval  671205 non-null  object
dtypes: float64(3), int64(1), object(9)
memory usage: 71.7+ MB
# how many unique values does each column have?
for col in df.columns:
    print(f"{col}: ", df[col].nunique())
funded_amount: 610
loan_amount: 479
activity: 163
sector: 15
use: 423452
country_code: 86
country: 87
region: 12695
currency: 67
term_in_months: 148
lender_count: 503
borrower_genders: 11298
repayment_interval: 4
# potential categories
# are there any suspicious strings?
cols = ["sector", "country_code", "currency", "repayment_interval"]
# lists too long to print: "activity", "country"
for col in cols:
    # collect the actual distinct values of the column
    strings = list(df[col].unique())
    print(f"{col}: {strings}")
sector: ['Food', 'Transportation', 'Arts', 'Services', 'Agriculture', 'Manufacturing', 'Wholesale', 'Retail', 'Clothing', 'Construction', 'Health', 'Education', 'Personal Use', 'Housing', 'Entertainment']
country_code: ['PK', 'IN', 'KE', 'NI', 'SV', 'TZ', 'PH', 'PE', 'SN', 'KH', 'LR', 'VN', 'IQ', 'HN', 'PS', 'MN', 'US', 'ML', 'CO', 'TJ', 'GT', 'EC', 'BO', 'YE', 'GH', 'SL', 'HT', 'CL', 'JO', 'UG', 'BI', 'BF', 'TL', 'ID', 'GE', 'UA', 'XK', 'AL', 'CD', 'CR', 'SO', 'ZW', 'CM', 'TR', 'AZ', 'DO', 'BR', 'MX', 'KG', 'AM', 'PY', 'LB', 'WS', 'IL', 'RW', 'ZM', 'NP', 'CG', 'MZ', 'ZA', 'TG', 'BJ', 'BZ', 'SR', 'TH', 'NG', 'MR', 'VU', 'PA', 'VI', 'VC', 'LA', 'MW', 'MM', 'MD', 'SS', 'SB', 'CN', 'EG', 'GU', 'AF', 'MG', nan, 'PR', 'LS', 'CI', 'BT']
currency: ['PKR', 'INR', 'KES', 'NIO', 'USD', 'TZS', 'PHP', 'PEN', 'XOF', 'LRD', 'VND', 'HNL', 'MNT', 'COP', 'GTQ', 'TJS', 'BOB', 'YER', 'KHR', 'GHS', 'SLL', 'HTG', 'CLP', 'JOD', 'UGX', 'BIF', 'IDR', 'GEL', 'UAH', 'EUR', 'ALL', 'CRC', 'XAF', 'TRY', 'AZN', 'DOP', 'BRL', 'MXN', 'KGS', 'AMD', 'PYG', 'LBP', 'WST', 'ILS', 'RWF', 'ZMW', 'NPR', 'MZN', 'ZAR', 'BZD', 'SRD', 'NGN', 'VUV', 'XCD', 'MWK', 'LAK', 'MMK', 'ZWD', 'MDL', 'SSP', 'SBD', 'CNY', 'EGP', 'MGA', 'NAD', 'LSL', 'THB']
repayment_interval: ['irregular', 'bullet', 'monthly', 'weekly']
df['activity'] = df['activity'].astype("category")
df['sector'] = df['sector'].astype("category")
df['use'] = df['use'].astype("string")
df['country_code'] = df['country_code'].astype("category")
df['country'] = df['country'].astype("category")
df['currency'] = df['currency'].astype("category")
df['region'] = df['region'].astype("string")
df['borrower_genders'] = df['borrower_genders'].astype("string")
df['repayment_interval'] = df['repayment_interval'].astype("category")
df.dtypes
funded_amount                float64
loan_amount                  float64
activity                    category
sector                      category
use                   string[python]
country_code                category
country                     category
region                string[python]
currency                    category
term_in_months               float64
lender_count                   int64
borrower_genders      string[python]
repayment_interval          category
dtype: object
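One motivation for converting low-cardinality columns to `category` is memory usage. The following sketch compares both representations on a toy column; the data is made up and the exact byte counts depend on the pandas version, so no figures are quoted here.

```python
import pandas as pd

# Toy column with few distinct values repeated many times (illustrative data).
sector = pd.Series(["Food", "Transportation", "Arts", "Agriculture"] * 25_000)

# deep=True also counts the memory held by the Python string objects.
as_object = sector.astype("object").memory_usage(deep=True)
as_category = sector.astype("category").memory_usage(deep=True)

print(f"object:   {as_object} bytes")
print(f"category: {as_category} bytes")
```

A categorical column stores each distinct label once plus a small integer code per row, which is why it tends to be much smaller for columns such as `sector` with only 15 distinct values.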
df_duplicated = df[df.duplicated(keep=False)]
df_duplicated
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 327 | 275.0 | 275.0 | Farming | Agriculture | to buy fertilizers and other farm supplies. | PH | Philippines | Brookes Point, Palawan | PHP | 8.0 | 8 | female | irregular |
| 392 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar lamp. | SV | El Salvador | <NA> | USD | 14.0 | 4 | male | monthly |
| 405 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar-powered lamp. | SV | El Salvador | <NA> | USD | 14.0 | 4 | male | monthly |
| 498 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar-powered lamp. | SV | El Salvador | <NA> | USD | 14.0 | 4 | male | monthly |
| 606 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar-powered lamp. | SV | El Salvador | <NA> | USD | 14.0 | 4 | male | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | <NA> | KES | 13.0 | 1 | female | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | <NA> | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | <NA> | KES | 13.0 | 0 | female | monthly |
34930 rows × 13 columns
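Note that `keep=False` marks every member of a duplicate group, including the first occurrence, so the row count above is larger than the number of rows that deduplication would remove. A toy example (made-up data) illustrates the difference:

```python
import pandas as pd

toy = pd.DataFrame({
    "loan_amount": [100.0, 100.0, 100.0, 275.0],
    "use": ["to buy a solar lamp."] * 3 + ["to buy fertilizers."],
})

# keep=False flags every row that has at least one identical twin.
all_copies = toy.duplicated(keep=False).sum()
# keep="first" flags only the repeats after the first occurrence.
extra_copies = toy.duplicated(keep="first").sum()

print(all_copies, extra_copies)
```

Because the data set carries no unique loan identifier, identical rows may also be genuinely distinct loans with the same attributes, so dropping them would be a judgment call.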
1) Drop rows with missing values in 'use' and 'borrower_genders'.
2) Replace the missing values (8) in 'country_code' using the information in the 'country' and 'region' columns.
3) Replace NaNs in 'region' with a placeholder, e.g. "Not specified".
df.isna().sum() # NaNs overview
funded_amount             0
loan_amount               0
activity                  0
sector                    0
use                    4233
country_code              8
country                   0
region                56800
currency                  0
term_in_months            0
lender_count              0
borrower_genders       4221
repayment_interval        0
dtype: int64
Rationale:
df.loc[df["use"].isna(), :]  # list the NaNs in 'use'
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | <NA> | TZ | Tanzania | <NA> | TZS | 10.0 | 110 | <NA> | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | <NA> | PE | Peru | <NA> | PEN | 20.0 | 44 | <NA> | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | <NA> | TZ | Tanzania | <NA> | TZS | 10.0 | 116 | <NA> | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | <NA> | TZ | Tanzania | <NA> | TZS | 10.0 | 75 | <NA> | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | <NA> | SV | El Salvador | <NA> | USD | 20.0 | 19 | <NA> | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671178 | 0.0 | 25.0 | Livestock | Agriculture | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671185 | 0.0 | 25.0 | Livestock | Agriculture | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
4233 rows × 13 columns
# Do the NaNs in 'borrower_genders' and 'use' share the same row indices? 4221 rows lack both
(df['use'].isna() & df['borrower_genders'].isna()).value_counts()
False    666984
True       4221
Name: count, dtype: int64
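The count of 4221 rows equals the total number of NaNs in `borrower_genders`, so every row without a gender also lacks `use`, while 4233 - 4221 = 12 rows lack only `use`. A cross-tabulation of the two missing-value flags shows all four combinations at once; the sketch below uses made-up toy data.

```python
import pandas as pd

toy = pd.DataFrame({
    "use": ["to buy seeds", None, None, "to stock a store"],
    "borrower_genders": ["female", None, "male", "male"],
})

# Rows: is 'use' missing? Columns: is 'borrower_genders' missing?
overlap = pd.crosstab(toy["use"].isna(), toy["borrower_genders"].isna())
print(overlap)
```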
# Do the NaNs in 'borrower_genders', 'use' and 'region' share the same row indices? Skewing? Is a particular country affected?
dfisna = df.loc[(df['use'].isna() & df['borrower_genders'].isna()
                 & df['region'].isna()), :]
dfisna
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | <NA> | TZ | Tanzania | <NA> | TZS | 10.0 | 110 | <NA> | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | <NA> | PE | Peru | <NA> | PEN | 20.0 | 44 | <NA> | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | <NA> | TZ | Tanzania | <NA> | TZS | 10.0 | 116 | <NA> | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | <NA> | TZ | Tanzania | <NA> | TZS | 10.0 | 75 | <NA> | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | <NA> | SV | El Salvador | <NA> | USD | 20.0 | 19 | <NA> | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671178 | 0.0 | 25.0 | Livestock | Agriculture | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671185 | 0.0 | 25.0 | Livestock | Agriculture | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | <NA> | KE | Kenya | <NA> | KES | 13.0 | 0 | <NA> | monthly |
4221 rows × 13 columns
dfisna['repayment_interval'].unique()
['monthly', 'bullet', 'irregular']
Categories (4, object): ['bullet', 'irregular', 'monthly', 'weekly']
dfisna['region'].unique()
<StringArray>
[<NA>]
Length: 1, dtype: string
df = df.dropna(subset=['use'])  # drop rows where "use" is NaN
# drop rows where "borrower_genders" is NaN
df = df.dropna(subset=['borrower_genders'])
Rationale:
# all NaNs in 'country_code' belong to the country Namibia;
# its ISO code "NA" was presumably read as a missing value by read_csv
dfisna2 = df[df['country_code'].isna()]
dfisna2
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | NaN | Namibia | Rundu | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | NaN | Namibia | Katima Mulilo | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | NaN | Namibia | Oshakati | NAD | 7.0 | 183 | female | bullet |
df['country_code'] = df['country_code'].cat.add_categories(
    "Na")  # add the new category "Na" to the categorical 'country_code'
df.loc[:, 'country_code'] = df['country_code'].fillna("Na")  # fill in "Na"
na_country = df[df['country_code'] == "Na"]
na_country
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | Na | Namibia | EEnhana | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | Na | Namibia | Rundu | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | Na | Namibia | EEnhana | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | Na | Namibia | Rundu | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | Na | Namibia | EEnhana | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | Na | Namibia | Rundu | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | Na | Namibia | Katima Mulilo | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | Na | Namibia | Oshakati | NAD | 7.0 | 183 | female | bullet |
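The missing codes are most likely a parsing effect: by default `pd.read_csv` treats the string "NA" as a missing value. As an alternative to filling the gaps afterwards, the defaults can be switched off at read time. The sketch below uses a made-up two-row sample and assumes that the raw file really contains "NA" for Namibia, which cannot be verified from the output shown here.

```python
import io

import pandas as pd

raw = "country_code#country\nPK#Pakistan\nNA#Namibia\n"

default_read = pd.read_csv(io.StringIO(raw), sep="#")
# keep_default_na=False disables the built-in missing-value markers such as "NA";
# na_values=[""] still treats empty fields as missing.
explicit_read = pd.read_csv(io.StringIO(raw), sep="#",
                            keep_default_na=False, na_values=[""])

print(default_read["country_code"].isna().sum())
print(explicit_read["country_code"].tolist())
```

With this approach the ISO code stays "NA" instead of the substitute "Na", at the cost of having to list every other missing-value marker explicitly in `na_values`.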
Rationale:
df.loc[:, 'region'] = df['region'].fillna(
    "Not specified")  # fill with "Not specified"
df[df['region'] == "Not specified"]  # check
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 250.0 | 250.0 | Services | Services | purchase leather for my business using ksh 20000. | KE | Kenya | Not specified | KES | 4.0 | 6 | female | irregular |
| 49 | 450.0 | 450.0 | General Store | Retail | to stock his store. | SV | El Salvador | Not specified | USD | 14.0 | 18 | male | monthly |
| 54 | 225.0 | 225.0 | Food Market | Food | to purchase various seasonal items to resell: ... | SN | Senegal | Not specified | XOF | 14.0 | 7 | female | monthly |
| 67 | 125.0 | 125.0 | Energy | Services | purchase solar lanterns for resale. | KE | Kenya | Not specified | KES | 3.0 | 6 | male | irregular |
| 70 | 2000.0 | 2000.0 | Retail | Retail | to install a display window and a sunshade for... | IQ | Iraq | Not specified | USD | 15.0 | 71 | male | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671194 | 0.0 | 25.0 | Livestock | Agriculture | Kiva Coordinator fixed issue loan (no longer v... | KE | Kenya | Not specified | KES | 13.0 | 0 | female, female | monthly |
| 671197 | 0.0 | 25.0 | Livestock | Agriculture | Pretend the issue with loan got addressed by K... | KE | Kenya | Not specified | KES | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Not specified | KES | 13.0 | 1 | female | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Not specified | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | Not specified | KES | 13.0 | 0 | female | monthly |
52573 rows × 13 columns
df.describe()
| funded_amount | loan_amount | term_in_months | lender_count | |
|---|---|---|---|---|
| count | 666972.000000 | 666972.000000 | 666972.00000 | 666972.000000 |
| mean | 785.131835 | 840.272905 | 13.73022 | 20.551025 |
| std | 1128.005848 | 1187.875622 | 8.59619 | 28.366363 |
| min | 0.000000 | 25.000000 | 1.00000 | 0.000000 |
| 25% | 250.000000 | 275.000000 | 8.00000 | 7.000000 |
| 50% | 450.000000 | 500.000000 | 13.00000 | 13.000000 |
| 75% | 900.000000 | 1000.000000 | 14.00000 | 24.000000 |
| max | 100000.000000 | 100000.000000 | 158.00000 | 2986.000000 |
df[df.funded_amount == 100000]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 70499 | 100000.0 | 100000.0 | Agriculture | Agriculture | create more than 300 jobs for women and farmer... | HT | Haiti | Les Cayes | USD | 75.0 | 2986 | female | irregular |
# Commented out because it slows down the notebook
# fig12 = px.box(df.funded_amount, title='Extreme values in funded_amount?')
# fig12.update_layout(yaxis_type="log")
# fig12.show()
df[df.loan_amount == 100000]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 70499 | 100000.0 | 100000.0 | Agriculture | Agriculture | create more than 300 jobs for women and farmer... | HT | Haiti | Les Cayes | USD | 75.0 | 2986 | female | irregular |
fig13 = px.box(df.loan_amount, title='Extreme values in loan_amount?')
fig13.update_layout(yaxis_type="log")
fig13.show()
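In addition to the visual check, extreme values can be flagged numerically. The following sketch applies the common 1.5 × IQR rule; the helper `iqr_upper_fence` is hypothetical and the toy values are made up, although its quartiles (275 and 1000) match those reported by `df.describe()` for `loan_amount`.

```python
import pandas as pd


def iqr_upper_fence(values, k=1.5):
    """Return Q3 + k * IQR, the upper fence of the usual box-plot outlier rule."""
    q1, q3 = values.quantile([0.25, 0.75])
    return q3 + k * (q3 - q1)


loan_amount = pd.Series([25.0, 275.0, 500.0, 1000.0, 100000.0])
fence = iqr_upper_fence(loan_amount)
print(fence, (loan_amount > fence).sum())
```

With quartiles of 275 and 1000 the fence is 1000 + 1.5 × 725 = 2087.5, so the single 100000 USD loan lies far above it. Whether such a project is an error or a legitimate large loan still has to be judged from its content.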